Abstract

Canalizing functions have important applications in physics and biology. For example, they represent a mechanism capable of stabilizing chaotic behavior in Boolean network models of discrete dynamical systems. When comparing the class of canalizing functions to other classes of functions with respect to their evolutionary plausibility as emergent control rules in genetic regulatory systems, it is informative to know the number of canalizing functions with a given number of input variables. This is also important in the context of using the class of canalizing functions as a constraint during the inference of genetic networks from gene expression data. To this end, we derive an exact formula for the number of canalizing Boolean functions of n variables. We also derive a formula for the probability that a random Boolean function is canalizing for any given bias p of taking the value 1. In addition, we consider the number and probability of Boolean functions that are canalizing for exactly k variables. Finally, we provide an algorithm for randomly generating canalizing functions with a given bias p and any number of variables, which is needed for Monte Carlo simulations of Boolean networks.
1 Introduction



A Boolean function (on $n$ variables) is a function $f : \{0,1\}^n \to \{0,1\}$. A canalizing function (also called a forcing function) is a type of Boolean function in which at least one of the input variables is able to determine the function output regardless of the values of the other variables. For example, the function $f(x_1, x_2, x_3) = x_1 + x_2 x_3$, where the addition symbol stands for disjunction and the multiplication for conjunction, is a canalizing function, since setting $x_1$ to 1 guarantees that the function value is 1 regardless of the values of $x_2$ and $x_3$. On the other hand, the function $f(x_1, x_2) = x_1 \oplus x_2$, where $\oplus$ is addition modulo 2, is not a canalizing function, since the values of both variables always need to be known in order to determine the function output.
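To make the definition concrete, here is a minimal brute-force sketch in Python. The helper name `is_canalizing` is ours and is only illustrative. The function tries every input variable and every value of that variable, and checks whether fixing it leaves only one possible output. Because it enumerates all $2^n$ inputs, it is practical only for small n.

```python
from itertools import product

def is_canalizing(f, n):
    """Brute-force test: is there an input x_i and a value s such that
    fixing x_i = s forces the same output for every setting of the others?"""
    inputs = list(product((0, 1), repeat=n))
    for i in range(n):
        for s in (0, 1):
            outputs = {f(*x) for x in inputs if x[i] == s}
            if len(outputs) == 1:  # x_i = s determines f
                return True
    return False

# f(x1, x2, x3) = x1 + x2 x3 (OR / AND): x1 = 1 forces the value 1
print(is_canalizing(lambda x1, x2, x3: x1 | (x2 & x3), 3))  # True
# f(x1, x2) = x1 XOR x2: no single input fixes the output
print(is_canalizing(lambda x1, x2: x1 ^ x2, 2))             # False
```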

Canalizing functions have been implicated in a number of phenomena related to discrete dynamical systems as well as nonlinear filters. Concerning the latter, they have been used to study the convergence behavior of an important class of nonlinear digital filters called stack filters [1,2,3]. For example, stack filters defined by canalizing functions are known to possess a convergence property whereby a filter is guaranteed to converge to a so-called root signal, or fixed point of the filter, after a finite number of passes [2]. In [4], some learning schemes were proposed to find minimal filters defined by canalizing functions.

Canalizing functions also play an important role in the study of phase transitions in random Boolean networks [5,6,7,8,9]. Boolean networks have been one of the most intensively studied models of discrete dynamical systems and have been used to gain insight into the behavior of large genetic networks [5], evolutionary principles [10,11], and the development of chaos [12,13]. Although structurally simple, these systems are capable of displaying a remarkably rich variety of complex behavior. Canalizing functions represent one of the few known mechanisms capable of preventing chaotic behavior in Boolean networks [5]. By increasing the percentage of canalizing functions in a Boolean network, one can move closer toward the ordered regime and, depending on the connectivity and the distribution of the number of canalizing variables, cross the phase transition boundary [14]. In fact, there is overwhelming evidence that canalizing functions are abundantly utilized in higher vertebrate gene regulatory systems [5]. A recent large-scale study of the literature on transcriptional regulation in eukaryotes demonstrated an overwhelming bias towards canalizing rules [15]. Canalization is also a natural mechanism for designing robustness against noise [16].

Knowledge of the number of possible canalizing functions with a given number of input variables is important for determining the degree to which these functions are evolutionarily plausible as regulatory rules in genetic networks. There are two related issues here. First, a class of functions that is overly limited in size is unlikely to emerge via the mechanism of random selection. Thus, when comparing different classes of functions vis-à-vis their likelihood of giving rise to regulatory control rules, it is informative to know their respective sizes [9]. Second, when gene regulatory rules are inferred from real gene expression measurements [17], it is often beneficial to constrain the inferential algorithms to a certain class of functions that can be produced. It may seem that imposing a constraint (e.g., restricting all functions to be canalizing) can only degrade the performance of the algorithm, yielding a larger estimation or prediction error than an algorithm with no imposed constraints. It turns out, however, that doing so can often improve the tractability and precision of the inference. This can be particularly noticeable when an inference is made from small sample sizes. In order to quantify the reduction in 'design cost' owing to the constraint, it is again informative to consider the size of the class of functions used as a constraint. Thus, it is an important goal to establish the number of canalizing functions of a given number of input variables.



Of course, one approach is to generate all Boolean functions with $n$ variables and check whether each one is canalizing. However, despite efficient methods to test the canalizing property [18], this approach becomes prohibitive for large values of $n$, since there are $2^{2^n}$ Boolean functions of $n$ variables, and the exact number has only been known for $n \le 5$ [9]. It has also been known that the number of canalizing functions with $n$ variables is upper bounded by $4n \cdot 2^{2^{n-1}}$ [19].
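The following is a minimal sketch of this brute-force approach. It assumes that a truth table is encoded as a Python integer whose bit x holds f(x), with $x_i$ = (x >> i) & 1. This encoding and the helper names are our own illustrative choices and do not come from the paper. Under this encoding, a function is canalizing exactly when its table is all 1s or all 0s on one of the $2n$ "half-cubes" $\{x : x_i = s\}$. Because the loop runs over all $2^{2^n}$ tables, it is feasible only for very small n.

```python
def half_cube_masks(n):
    """Bitmask of {x : x_i = s} for every variable i and value s (2n masks)."""
    return [sum(1 << x for x in range(2 ** n) if (x >> i) & 1 == s)
            for i in range(n) for s in (0, 1)]

def count_canalizing_brute_force(n):
    masks = half_cube_masks(n)
    count = 0
    for table in range(2 ** (2 ** n)):  # every Boolean function of n variables
        # canalizing iff the table is all 1s or all 0s on some half-cube
        if any((table & m) in (0, m) for m in masks):
            count += 1
    return count

print([count_canalizing_brute_force(n) for n in range(1, 5)])  # [4, 14, 120, 3514]
```

For n = 5 the loop would already have to visit $2^{32}$ tables, which is why a closed formula is needed.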



In this paper, we derive an exact formula for the number of canalizing functions with n variables. We also derive a formula for the probability that
a random Boolean function whose truth table is a Bernoulli(p) random vector 
is canalizing. The latter is important because the 'bias' p of Boolean functions 
also plays a crucial role in the order-disorder transition in Boolean networks 
and it is known that canalizing functions are likely to be biased, meaning that 
they are expected to have a large number of ones or zeros in their truth tables 
[8,9]. Since a canalizing function can have one or more canalizing variables, 
we also consider the number and probability of Boolean functions that are 
canalizing for exactly k variables. This is also a relevant issue because it is 
known that tuning the number of canalizing inputs in a random Boolean 
network can dramatically affect its dynamical behavior. Moreover, according 
to the formulas derived in our paper, real genetic regulatory rules appear to 
be highly skewed towards large numbers of canalizing inputs [15] relative to 
what should be expected by chance in a canalizing function. 
2 The probability of canalizing functions

Throughout this paper, let $n \ge 1$ be a fixed positive integer. For each positive integer $k$, the set $\{0, 1, \ldots, k-1\}$ will be denoted by $[k]$. The cardinality of a set $A$ will be denoted by $|A|$. Let $0 < p < 1$. We will consider the following probability measure on the space of all Boolean functions:

$$\Pr_p(f) = p^{|f^{-1}(\{1\})|}\,(1-p)^{|f^{-1}(\{0\})|}.$$

We call $\Pr_p(f)$ the probability of $f$ for bias $p$.
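Equivalently, under $\Pr_p$ the truth table of $f$ is a vector of $2^n$ independent Bernoulli($p$) bits. The sketch below is a minimal Python version of this measure. It assumes an integer encoding of truth tables, which is our own illustrative convention: bit x of the integer holds f(x). It also uses exact rational arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction

def pr_p(table, n, p):
    """Pr_p(f) = p^(number of 1s) * (1 - p)^(number of 0s) of the truth table.
    The table is an int whose bit x holds f(x) for x in 0 .. 2^n - 1."""
    ones = bin(table).count("1")
    return p ** ones * (1 - p) ** (2 ** n - ones)

n, p = 2, Fraction(1, 3)
# the probabilities of all 2^(2^n) functions sum to 1
print(sum(pr_p(t, n, p) for t in range(2 ** (2 ** n))))  # 1
# the constant function 1 has probability p^(2^n)
print(pr_p(2 ** (2 ** n) - 1, n, p) == p ** (2 ** n))   # True
```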

Recall that a Boolean function $f$ is canalizing if there exist $i \in [n]$ ($x_i$ is then called a canalizing variable) and $s, v \in \{0,1\}$ such that:

$$\forall x \in \{0,1\}^n \,\big(x_i = s \Rightarrow f(x) = v\big). \tag{1}$$

If $v = 1$, then we will say that $f$ is positively canalizing; if $v = 0$, then we will say that $f$ is negatively canalizing.

Let C be the set of all canalizing Boolean functions; let PC be the set of all 
positively canalizing Boolean functions, let NC be the set of all negatively 
canalizing Boolean functions, and let BC be the set of Boolean functions that 
are both positively and negatively canalizing. 

Our goal in this section is to calculate $\Pr_p(C)$. Since $C = PC \cup NC$ and $BC = PC \cap NC$, we have

$$\Pr_p(C) = \Pr_p(PC) + \Pr_p(NC) - \Pr_p(BC). \tag{2}$$

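Let us first dispose of the easy part and calculate $\Pr_p(BC)$. Note that a Boolean function $f$ cannot be positively canalizing for a canalizing variable $x_i$ and negatively canalizing for a different canalizing variable $x_j \ne x_i$. If $x_i = s$ forced the value 1 and $x_j = t$ forced the value 0, then an input with $x_i = s$ and $x_j = t$ would have to take both values. Thus for every $f \in BC$ there exists a unique canalizing variable $x_{i(f)}$, and with $i = i(f)$ we either have

$$\forall x \in \{0,1\}^n \,\big((x_i = 0 \Rightarrow f(x) = 0) \;\&\; (x_i = 1 \Rightarrow f(x) = 1)\big),$$

or we have

$$\forall x \in \{0,1\}^n \,\big((x_i = 0 \Rightarrow f(x) = 1) \;\&\; (x_i = 1 \Rightarrow f(x) = 0)\big).$$

In other words, $f = x_i$ or $f = \overline{x_i}$. Thus $|BC| = 2n$, and for every function $f \in BC$ we have $\Pr_p(f) = p^{2^{n-1}}(1-p)^{2^{n-1}}$. It follows that

$$\Pr_p(BC) = 2n\,p^{2^{n-1}}(1-p)^{2^{n-1}}. \tag{3}$$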
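Equation (3) can be checked exhaustively for a small n. The sketch below makes two assumptions of our own. First, truth tables are Python integers whose bit x holds f(x), with $x_i$ = (x >> i) & 1. Second, a function is treated as positively (negatively) canalizing exactly when its table is all 1s (all 0s) on some half-cube $\{x : x_i = s\}$. The helper names are illustrative.

```python
from fractions import Fraction

def half_cube_masks(n):
    """Bitmask of {x : x_i = s} for every variable i and value s."""
    return [sum(1 << x for x in range(2 ** n) if (x >> i) & 1 == s)
            for i in range(n) for s in (0, 1)]

def pr_p(t, n, p):
    ones = bin(t).count("1")
    return p ** ones * (1 - p) ** (2 ** n - ones)

n, p = 3, Fraction(1, 3)
masks = half_cube_masks(n)
# BC: all 1s on some half-cube (positive) and all 0s on some half-cube (negative)
bc = [t for t in range(2 ** (2 ** n))
      if any((t & m) == m for m in masks) and any((t & m) == 0 for m in masks)]
half = 2 ** (n - 1)
print(len(bc))  # 6, i.e. 2n
print(sum(pr_p(t, n, p) for t in bc) == 2 * n * p ** half * (1 - p) ** half)  # True, equation (3)
```

The constant functions never enter `bc`, because a constant table cannot be all 1s on one half-cube and all 0s on another.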

In our calculations of $\Pr_p(PC)$ and $\Pr_p(NC)$ it will be convenient to work only with nonconstant canalizing functions. Let $PC^- = PC \setminus \{\mathbf{1}\}$ and $NC^- = NC \setminus \{\mathbf{0}\}$, where $\mathbf{1}$ and $\mathbf{0}$ are the Boolean functions that take the value 1, respectively 0, everywhere. In this terminology, equation (2) is equivalent to:

$$\Pr_p(C) = \Pr_p(PC^-) + \Pr_p(NC^-) + p^{2^n} + (1-p)^{2^n} - 2n\,p^{2^{n-1}}(1-p)^{2^{n-1}}. \tag{4}$$

Now we need to be a little more specific about the number of variables for 
which a function is canalizing. 

Definition 1 Let $f : \{0,1\}^n \to \{0,1\}$ and let $I$ be a nonempty subset of $[n]$. We say that $f$ is positively canalizing on $I$ if there exists a function $\sigma : I \to \{0,1\}$, called a signature of $f$ on $I$, such that

$$\forall x \in \{0,1\}^n \,\big(\exists i \in I\, (x_i = \sigma(i)) \Rightarrow f(x) = 1\big). \tag{5}$$

The notion of being negatively canalizing on $I$ is defined analogously. The set of all nonconstant Boolean functions that are positively canalizing on a given index set $I$ will be denoted by $PC^-_I$; the set of all nonconstant Boolean functions that are negatively canalizing on a given index set $I$ will be denoted by $NC^-_I$.

Fact 1 Let $f \in PC^-_I$ or $f \in NC^-_I$. Then there exists exactly one signature for $f$ on $I$.

Proof. Without loss of generality suppose $f \in PC^-_I$, and assume towards a contradiction that $\sigma, \tau : I \to \{0,1\}$ are two different signatures for $f$. Let $i \in I$ be such that $\sigma(i) \ne \tau(i)$. Then for every $x \in \{0,1\}^n$ we have $x_i = \sigma(i)$ or $x_i = \tau(i)$, and it follows from equation (5) that $f(x) = 1$. Thus $f = \mathbf{1}$, which contradicts the assumption that $f \in PC^-_I$. ■

If $f \in PC^-_I$ or $f \in NC^-_I$, then we let $\sigma_{I,f}$ denote the unique signature of $f$ on $I$.

Lemma 1 Let $I$ be a nonempty subset of $[n]$. Then $PC^-_I = \bigcap_{i \in I} PC^-_{\{i\}}$ and $NC^-_I = \bigcap_{i \in I} NC^-_{\{i\}}$.

Proof. Suppose $f \in PC^-_I$ and $i \in I$. It is easy to see that the restriction of $\sigma_{I,f}$ to $\{i\}$ is a signature for $f$ on $\{i\}$, and thus $f \in PC^-_{\{i\}}$. Now suppose $f \in PC^-_{\{i\}}$ for all $i \in I$. Let $\sigma = \bigcup\{\sigma_{\{i\},f} : i \in I\}$. Then $\sigma$ is a signature for $f$ on $I$: if $x_i = \sigma(i)$ for some $i \in I$, then $f(x) = 1$ because $\sigma_{\{i\},f}$ is a signature for $f$ on $\{i\}$. It follows that $f \in PC^-_I$.

The proof of the second equation is analogous. ■
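Lemma 1 can be confirmed exhaustively for a small n. In the sketch below, `signature` is a hypothetical helper of ours. It searches for a signature of a nonconstant truth table on an index set I. Truth tables are assumed to be Python integers whose bit x holds f(x), with $x_i$ = (x >> i) & 1. Because of Fact 1, returning the first signature found is enough.

```python
from itertools import combinations, product

def signature(t, n, I):
    """Return the signature sigma_{I,f} (a dict i -> bit) if the nonconstant
    truth table t is positively canalizing on I (Definition 1), else None."""
    if t in (0, (1 << 2 ** n) - 1):
        return None  # PC^-_I contains nonconstant functions only
    for bits in product((0, 1), repeat=len(I)):
        sigma = dict(zip(I, bits))
        # equation (5): f(x) = 1 whenever x_i = sigma(i) for some i in I
        if all((t >> x) & 1 for x in range(2 ** n)
               if any((x >> i) & 1 == sigma[i] for i in I)):
            return sigma  # unique by Fact 1
    return None

n = 3
for t in range(2 ** (2 ** n)):
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            sig = signature(t, n, I)
            singles = [signature(t, n, (i,)) for i in I]
            # Lemma 1: f is in PC^-_I iff f is in PC^-_{i} for every i in I ...
            assert (sig is not None) == all(s is not None for s in singles)
            # ... and sigma_{I,f} is the union of the singleton signatures
            if sig is not None:
                assert sig == {i: b for s in singles for i, b in s.items()}
print("Lemma 1 verified for n =", n)
```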

It follows from the definition of canalizing functions that

$$PC^- = \bigcup_{i<n} PC^-_{\{i\}}, \qquad NC^- = \bigcup_{i<n} NC^-_{\{i\}}. \tag{6}$$

Unfortunately, the sets $PC^-_{\{i\}}$ are not pairwise disjoint, so we have to use the Inclusion-Exclusion Principle to calculate the probability of their union. This gives:

$$\Pr_p(PC^-) = \sum_{0 \le i < n} \Pr_p(PC^-_{\{i\}}) - \sum_{0 \le i_1 < i_2 < n} \Pr_p(PC^-_{\{i_1\}} \cap PC^-_{\{i_2\}}) + \cdots + (-1)^{k+1} \sum_{0 \le i_1 < i_2 < \cdots < i_k < n} \Pr_p(PC^-_{\{i_1\}} \cap PC^-_{\{i_2\}} \cap \cdots \cap PC^-_{\{i_k\}}) + \cdots \tag{7}$$



By Lemma 1, equation (7) can be written as:

$$\Pr_p(PC^-) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{0 \le i_1 < i_2 < \cdots < i_k < n} \Pr_p(PC^-_{\{i_1, i_2, \ldots, i_k\}}). \tag{8}$$

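Since for $|I| = |J|$ we obviously have $\Pr_p(PC^-_I) = \Pr_p(PC^-_J)$ (a permutation of the variables maps one set onto the other and preserves $\Pr_p$), we can rewrite equation (8) as follows:

$$\Pr_p(PC^-) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \Pr_p(PC^-_{[k]}). \tag{9}$$

The analogous reasoning shows that

$$\Pr_p(NC^-) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \Pr_p(NC^-_{[k]}). \tag{10}$$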



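Now it remains to compute $\Pr_p(PC^-_{[k]})$ and $\Pr_p(NC^-_{[k]})$.

Lemma 2 Let $1 \le k \le n$. Then

$$\Pr_p(PC^-_{[k]}) = 2^k\left(p^{2^n - 2^{n-k}} - p^{2^n}\right), \tag{11}$$

$$\Pr_p(NC^-_{[k]}) = 2^k\left((1-p)^{2^n - 2^{n-k}} - (1-p)^{2^n}\right). \tag{12}$$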



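Proof. We prove equation (11); the proof of equation (12) is analogous. By Fact 1 we have

$$\Pr_p(PC^-_{[k]}) = \sum_{\sigma : [k] \to \{0,1\}} \Pr_p\big(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma\big). \tag{13}$$

It is clear that for any $\sigma, \tau : [k] \to \{0,1\}$ we have $\Pr_p(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma) = \Pr_p(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \tau)$. The reason is that negating the inputs $x_i$ with $\sigma(i) \ne \tau(i)$ is a bijection between the two events that preserves the number of ones in the truth table. Fix an arbitrary $\sigma^* : [k] \to \{0,1\}$. Equation (13) now implies:

$$\Pr_p(PC^-_{[k]}) = 2^k \Pr_p\big(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma^*\big). \tag{14}$$

Let us calculate $\Pr_p(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma^*)$. If $f \in PC^-_{[k]}$ and $\sigma_{[k],f} = \sigma^*$, then $f(x) = 1$ whenever $x_i = \sigma^*(i)$ for at least one $i < k$. So there are $2^{n-k}$ arguments $x$ of $f$ on which $f$ can take arbitrary values, namely those with $x_i \ne \sigma^*(i)$ for all $i < k$. The only restriction on them is that $f$ may not take the value 1 everywhere. On the remaining $2^n - 2^{n-k}$ arguments $x$, $f$ has to take the value 1. In other words,

$$\Pr_p\big((f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma^*) \vee f = \mathbf{1}\big) = p^{2^n - 2^{n-k}}. \tag{15}$$

Since $\Pr_p(\mathbf{1}) = p^{2^n}$, equation (15) implies

$$\Pr_p\big(f \in PC^-_{[k]} \;\&\; \sigma_{[k],f} = \sigma^*\big) = p^{2^n - 2^{n-k}} - p^{2^n}. \tag{16}$$

This in turn implies equation (11). ■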
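Equation (11) can be checked against direct enumeration with exact arithmetic. The sketch below encodes truth tables as Python integers whose bit x holds f(x); this encoding and the helper names are our own illustrative choices. It adds up $\Pr_p(f)$ over all nonconstant $f$ that are positively canalizing on $[k]$, and compares the total with the right-hand side of (11).

```python
from fractions import Fraction

def pr_p(t, n, p):
    ones = bin(t).count("1")
    return p ** ones * (1 - p) ** (2 ** n - ones)

def in_pc_minus_first_k(t, n, k):
    """Is the nonconstant table t positively canalizing on [k] = {0, ..., k-1}?"""
    if t in (0, (1 << 2 ** n) - 1):
        return False
    for sigma in range(2 ** k):  # bit i of sigma encodes sigma(i)
        forced = [x for x in range(2 ** n)
                  if any((x >> i) & 1 == (sigma >> i) & 1 for i in range(k))]
        if all((t >> x) & 1 for x in forced):  # f = 1 on all forced inputs
            return True
    return False

def lemma2_pc(n, k, p):
    """Right-hand side of equation (11)."""
    return 2 ** k * (p ** (2 ** n - 2 ** (n - k)) - p ** (2 ** n))

n, p = 3, Fraction(2, 5)
for k in range(1, n + 1):
    brute = sum(pr_p(t, n, p) for t in range(2 ** (2 ** n))
                if in_pc_minus_first_k(t, n, k))
    print(k, brute == lemma2_pc(n, k, p))  # True for every k
```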
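Now let us put all our formulas together. We get:

$$\Pr_p(C) = p^{2^n} + (1-p)^{2^n} - 2n\,p^{2^{n-1}}(1-p)^{2^{n-1}} + \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^k \left(p^{2^n - 2^{n-k}} + (1-p)^{2^n - 2^{n-k}} - p^{2^n} - (1-p)^{2^n}\right). \tag{17}$$

Note that

$$\sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^k \left(p^{2^n} + (1-p)^{2^n}\right) = -\left(p^{2^n} + (1-p)^{2^n}\right)\left((1-2)^n - 1\right) = \left(p^{2^n} + (1-p)^{2^n}\right)\left(1 - (-1)^n\right). \tag{18}$$

Thus equation (17) simplifies to:

$$\Pr_p(C) = (-1)^n\left(p^{2^n} + (1-p)^{2^n}\right) - 2n\,p^{2^{n-1}}(1-p)^{2^{n-1}} + \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^k \left(p^{2^n - 2^{n-k}} + (1-p)^{2^n - 2^{n-k}}\right). \tag{19}$$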
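Equation (19) is straightforward to evaluate. The sketch below implements it with exact fractions and compares it with an exhaustive sum over all canalizing truth tables for small n. The function names are ours, and the brute-force side assumes truth tables are encoded as integers whose bit x holds f(x).

```python
from fractions import Fraction
from math import comb

def pr_canalizing(n, p):
    """Equation (19): probability that a bias-p random function of n variables is canalizing."""
    q = 1 - p
    N, half = 2 ** n, 2 ** (n - 1)
    total = (-1) ** n * (p ** N + q ** N) - 2 * n * p ** half * q ** half
    for k in range(1, n + 1):
        e = N - 2 ** (n - k)
        total += (-1) ** (k + 1) * comb(n, k) * 2 ** k * (p ** e + q ** e)
    return total

def pr_canalizing_brute(n, p):
    """Sum of Pr_p(f) over all canalizing truth tables f (feasible for small n only)."""
    masks = [sum(1 << x for x in range(2 ** n) if (x >> i) & 1 == s)
             for i in range(n) for s in (0, 1)]
    total = Fraction(0)
    for t in range(2 ** (2 ** n)):
        if any((t & m) in (0, m) for m in masks):
            ones = bin(t).count("1")
            total += p ** ones * (1 - p) ** (2 ** n - ones)
    return total

p = Fraction(1, 3)
for n in (1, 2, 3):
    print(n, pr_canalizing(n, p) == pr_canalizing_brute(n, p))  # True
```

For n = 2 the only non-canalizing functions are XOR and its negation, so the result is $1 - 2p^2(1-p)^2$.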



3 The number of canalizing functions 

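Equation (19) allows us to derive a formula for the number of canalizing functions as follows. Set $p = 0.5$. Then all functions have equal probability, and we can simply compute:

$$|C| = \Pr_{0.5}(C)\,2^{2^n} = 2^{2^n}\left(2\big((-1)^n - n\big)2^{-2^n} + \sum_{k=1}^{n}(-1)^{k+1}\binom{n}{k}2^{k+1}2^{-2^n + 2^{n-k}}\right) = 2\big((-1)^n - n\big) + \sum_{k=1}^{n}(-1)^{k+1}\binom{n}{k}2^{k+1}2^{2^{n-k}}. \tag{20}$$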

The values of |C| for n = 1, . . . , 10 are shown in Table 1. 
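Equation (20) only involves integers, so it can be evaluated exactly. A minimal sketch (the function name `num_canalizing` is ours):

```python
from math import comb

def num_canalizing(n):
    """Exact number |C| of canalizing Boolean functions of n variables, equation (20)."""
    total = 2 * ((-1) ** n - n)
    for k in range(1, n + 1):
        total += (-1) ** (k + 1) * comb(n, k) * 2 ** (k + 1) * 2 ** (2 ** (n - k))
    return total

for n in range(1, 11):
    print(n, num_canalizing(n))
```

Python's arbitrary-precision integers reproduce the exact entries of Table 1. They also give the full 79-digit and 156-digit values for n = 9 and n = 10, which the table lists only approximately.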

It is interesting to note that for large $n$ the value of $|C|$ given by equation (20) asymptotically approaches the upper bound of $4n \cdot 2^{2^{n-1}}$ given in [19]. To see this, let $S_k = \binom{n}{k}2^{k+1}2^{2^{n-k}}$ for $1 \le k \le n$. Then

$$|C| = 2\big((-1)^n - n\big) + \sum_{k=1}^{n}(-1)^{k+1}S_k. \tag{21}$$

For sufficiently large $n$, the first term becomes negligible, and we can concentrate on the asymptotic behavior of $S = \sum_{k=1}^{n}(-1)^{k+1}S_k$. Moreover, it is not hard to see that $S_k > S_{k+1}$ for all $k < n$. Thus the partial sums of $S$ with an odd number of terms form an upper bound for $S$, while the partial sums with an even number of terms form a lower bound. In particular, for $n \ge 3$ the first and second partial sums give the inequalities:

$$S_1 - S_2 < S < S_1. \tag{22}$$

Dividing by $S_1$ we obtain

$$1 - \frac{S_2}{S_1} < \frac{S}{S_1} < 1. \tag{23}$$

As $n$ approaches infinity, $S_2/S_1 = (n-1)\,2^{-2^{n-2}}$ approaches zero, and therefore $S$ is asymptotic to $S_1$. Now it suffices to note that $S_1 = \binom{n}{1}2^{2}2^{2^{n-1}} = 4n \cdot 2^{2^{n-1}}$ is exactly the upper bound given in [19].
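The bounds (22) and the closed form of $S_2/S_1$ can be checked numerically. The sketch below uses our own helper `S(n, k)` for the summands of (21). It starts at n = 3, because for n = 2 the sum $S$ has only two terms and the left inequality in (22) becomes an equality.

```python
from fractions import Fraction
from math import comb

def S(n, k):
    """S_k = C(n, k) 2^(k+1) 2^(2^(n-k)), the k-th summand of equation (21)."""
    return comb(n, k) * 2 ** (k + 1) * 2 ** (2 ** (n - k))

for n in range(3, 9):
    s = sum((-1) ** (k + 1) * S(n, k) for k in range(1, n + 1))
    assert S(n, 1) - S(n, 2) < s < S(n, 1)                               # (22)
    assert Fraction(S(n, 2), S(n, 1)) == Fraction(n - 1, 2 ** (2 ** (n - 2)))
    assert S(n, 1) == 4 * n * 2 ** (2 ** (n - 1))                        # bound of [19]
    print(n, float(Fraction(s, S(n, 1))))                                # S / S_1 -> 1
```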



4 Functions that are canalizing for k variables



By definition, a Boolean function is canalizing if and only if it is canalizing for at least one variable. How can we compute the number and probability of Boolean functions that are canalizing for exactly $k$ variables? To solve this problem, we need a generalization of the Inclusion-Exclusion Principle. The following lemma appears as Corollary 5B.4 in [20].

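Lemma 3 Let $f_0, \ldots, f_{n-1}$ be real-valued functions with a common domain, and let $u$ be the function that is identically 1 on the common domain of the $f_i$'s. Let $I \subseteq [n]$, and let $I^c = [n] \setminus I$. Then, with $R$ ranging over subsets of $[n]$,

$$\prod_{i \in I} f_i \prod_{i \in I^c}(u - f_i) = \sum_{R \supseteq I}(-1)^{|R| - |I|}\prod_{i \in R} f_i. \tag{24}$$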



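Now suppose that $E_0, \ldots, E_{n-1}$ are events in a fixed probability space $\Omega$, and that $f_i$ is the characteristic function of $E_i$ on $\Omega$ for $i = 0, \ldots, n-1$. Then we have for $I \subseteq [n]$:

$$\Pr\Big(\prod_{i \in I} f_i = 1\Big) = \Pr\Big(\bigcap_{i \in I} E_i\Big),$$

$$\Pr\Big(\prod_{i \in I} f_i \prod_{i \in I^c}(u - f_i) = 1\Big) = \Pr\Big(\bigcap_{i \in I} E_i \cap \bigcap_{i \in I^c} E_i^c\Big).$$

Each of these products is itself a characteristic function, so its expectation is the probability of the corresponding event. Taking expectations of both sides, equation (24) translates into:

$$\Pr\Big(\bigcap_{i \in I} E_i \cap \bigcap_{i \in I^c} E_i^c\Big) = \sum_{R \supseteq I}(-1)^{|R| - |I|}\Pr\Big(\bigcap_{i \in R} E_i\Big). \tag{25}$$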






Now let $E_0, \ldots, E_{n-1}$ and $I$ be as above. Following Definition 5B.5 of [20] we define:

$$IN(I) = \{\omega \in \Omega : \omega \in E_i \Leftrightarrow i \in I\},$$

and for $k \le n$:

$$IN(k) = \bigcup_{|I| = k} IN(I).$$

The following lemma is a straightforward generalization of Corollary 5B.6 
of [20]: 

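Lemma 4 In the terminology introduced above we have:

$$\Pr(IN(I)) = \sum_{R \supseteq I}(-1)^{|R| - |I|}\Pr\Big(\bigcap_{i \in R} E_i\Big), \tag{26}$$

$$\Pr(IN(k)) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\sum_{R \subseteq [n],\,|R| = r}\Pr\Big(\bigcap_{i \in R} E_i\Big). \tag{27}$$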
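Equation (27) is easy to sanity-check numerically. The sketch below evaluates the right-hand side of (27) and compares it with a direct count of the outcomes that lie in exactly k of the events. The helper name `prob_exactly_k` is ours. The sketch works on a small finite space with the uniform measure, an assumption made only for illustration; the lemma holds for any probability space.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def prob_exactly_k(omega, events, k):
    """Right-hand side of equation (27) for the uniform measure on a finite omega."""
    n = len(events)
    total = Fraction(0)
    for r in range(k, n + 1):
        for R in combinations(range(n), r):
            inter = set(omega)          # empty intersection (r = 0) is omega itself
            for i in R:
                inter &= events[i]
            total += comb(r, k) * (-1) ** (r - k) * Fraction(len(inter), len(omega))
    return total

omega = set(range(12))
events = [set(range(0, 6)), set(range(3, 9)), {0, 2, 4, 6, 8, 10}]
for k in range(len(events) + 1):
    # direct count of outcomes lying in exactly k of the events, i.e. Pr(IN(k))
    direct = Fraction(sum(1 for w in omega if sum(w in E for E in events) == k), len(omega))
    print(k, prob_exactly_k(omega, events, k) == direct)  # True for each k
```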

Let us apply equation (27) to the situation where $\Omega$ is the space of all Boolean functions of $n$ variables with the probability function $\Pr_p$ defined above. For $1 \le k \le n$, let $\Pr_p(PCE_k)$ denote the probability that a randomly chosen Boolean function with bias $p$ is positively (but not negatively) canalizing on exactly $k$ variables. Similarly, let $\Pr_p(NCE_k)$ denote the probability that a randomly chosen Boolean function with bias $p$ is negatively (but not positively) canalizing on exactly $k$ variables. We will compute $\Pr_p(PCE_k)$. For $i < n$, let $E_i = PC^-_{\{i\}}$. Note that for this choice of $E_i$ and $k < n$, $IN(k)$ is the set of functions that are positively canalizing for exactly $k$ variables; and the set of functions that are positively canalizing for $n$ variables is $IN(n) \cup \{\mathbf{1}\}$. In view of Lemma 1, equation (27) now boils down to the following:

$$\Pr_p(IN(k)) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\sum_{R \subseteq [n],\,|R| = r}\Pr_p(PC^-_R). \tag{28}$$

Since the probability of $PC^-_R$ depends only on $|R|$, we get

$$\Pr_p(IN(k)) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\binom{n}{r}\Pr_p(PC^-_{[r]}). \tag{29}$$
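It follows from Lemma 2 that for $k \ge 1$ we have:

$$\Pr_p(IN(k)) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\binom{n}{r}2^r\left(p^{2^n - 2^{n-r}} - p^{2^n}\right). \tag{30}$$

Thus it follows that for $1 < k < n$ we have:

$$\Pr_p(PCE_k) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\binom{n}{r}2^r\left(p^{2^n - 2^{n-r}} - p^{2^n}\right). \tag{31}$$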
A similar argument shows that for $1 < k < n$ we have:

$$\Pr_p(NCE_k) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\binom{n}{r}2^r\left((1-p)^{2^n - 2^{n-r}} - (1-p)^{2^n}\right). \tag{32}$$



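For $1 < k = n$ the sums in (31) and (32) reduce to the single term $r = n$. The two constant functions are canalizing for all $n$ variables but do not belong to $IN(n)$, so they are added separately when we count functions below:

$$\Pr_p(PCE_n) = 2^n\left(p^{2^n - 1} - p^{2^n}\right), \tag{33}$$

$$\Pr_p(NCE_n) = 2^n\left((1-p)^{2^n - 1} - (1-p)^{2^n}\right). \tag{34}$$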

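For $1 = k < n$ we need to subtract the probability that the function is canalizing both ways, since the $2n$ functions $x_i$ and $\overline{x_i}$ lie in $IN(1)$ but belong to $BC$. This gives:

$$\Pr_p(PCE_1) = n\left(\Pr_p(PC^-_{[1]}) - 2p^{2^{n-1}}(1-p)^{2^{n-1}}\right) + \sum_{r=2}^{n}(-1)^{r-1} r\binom{n}{r}\Pr_p(PC^-_{[r]}) = 2n\left(p^{2^{n-1}} - p^{2^n} - p^{2^{n-1}}(1-p)^{2^{n-1}}\right) + \sum_{r=2}^{n}(-1)^{r-1} r\binom{n}{r}2^r\left(p^{2^n - 2^{n-r}} - p^{2^n}\right), \tag{35}$$

$$\Pr_p(NCE_1) = n\left(\Pr_p(NC^-_{[1]}) - 2p^{2^{n-1}}(1-p)^{2^{n-1}}\right) + \sum_{r=2}^{n}(-1)^{r-1} r\binom{n}{r}\Pr_p(NC^-_{[r]}) = 2n\left((1-p)^{2^{n-1}} - (1-p)^{2^n} - p^{2^{n-1}}(1-p)^{2^{n-1}}\right) + \sum_{r=2}^{n}(-1)^{r-1} r\binom{n}{r}2^r\left((1-p)^{2^n - 2^{n-r}} - (1-p)^{2^n}\right). \tag{36}$$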

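Let $c(k)$ denote the number of functions that are canalizing for exactly $k$ variables. We have:

$$c(k) = \left(\Pr_{0.5}(PCE_k) + \Pr_{0.5}(NCE_k)\right)2^{2^n} \quad \text{if } 1 < k < n, \tag{37}$$

$$c(k) = 2 + \left(\Pr_{0.5}(PCE_k) + \Pr_{0.5}(NCE_k)\right)2^{2^n} \quad \text{if } 1 < k = n, \tag{38}$$

$$c(k) = \left(\Pr_{0.5}(PCE_1) + \Pr_{0.5}(NCE_1) + \Pr_{0.5}(BC)\right)2^{2^n} \quad \text{if } 1 = k < n, \tag{39}$$

$$c(k) = 2 + \left(\Pr_{0.5}(PCE_1) + \Pr_{0.5}(NCE_1) + \Pr_{0.5}(BC)\right)2^{2^n} \quad \text{if } 1 = k = n, \tag{40}$$

where the summand 2 in (38) and (40) accounts for the two constant functions.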

This implies the following formulas for $c(k)$.

For $1 < k < n$:

$$c(k) = \sum_{r=k}^{n}\binom{r}{k}(-1)^{r-k}\binom{n}{r}2^{r+1}\left(2^{2^{n-r}} - 1\right). \tag{41}$$

For $1 < k = n$:

$$c(k) = 2 + 2^{n+1}. \tag{42}$$

For $1 = k < n$:

$$c(1) = 2n\left(2^{2^{n-1}+1} - 3\right) + \sum_{r=2}^{n}(-1)^{r-1} r\binom{n}{r}2^{r+1}\left(2^{2^{n-r}} - 1\right). \tag{43}$$

For $1 = k = n$:

$$c(1) = 2 + 2 \cdot 1 \cdot \left(2^{2^{0}+1} - 3\right) + 0 = 4. \tag{44}$$
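The four cases (41)-(44) can be packaged into one function and checked against equation (20). The total $\sum_k c(k)$ must equal $|C|$, because every canalizing function is canalizing for exactly one number $k \ge 1$ of variables. The name `c_exact` is ours; the sketch simply transcribes the formulas.

```python
from math import comb

def c_exact(n, k):
    """Number of Boolean functions of n variables that are canalizing for exactly
    k variables (1 <= k <= n), following equations (41)-(44)."""
    # the sum over r shared by (41) and (43); for k = 1 it starts at r = 2
    s = sum(comb(r, k) * (-1) ** (r - k) * comb(n, r) * 2 ** (r + 1)
            * (2 ** (2 ** (n - r)) - 1)
            for r in range(max(k, 2), n + 1))
    if k == 1:
        constants = 2 if n == 1 else 0       # (44): the two constants when n = 1
        return constants + 2 * n * (2 ** (2 ** (n - 1) + 1) - 3) + s   # (43)
    if k == n:
        return 2 + 2 ** (n + 1)              # (42)
    return s                                 # (41)

print([c_exact(3, k) for k in range(1, 4)])          # [78, 24, 18]
print(sum(c_exact(4, k) for k in range(1, 5)))       # 3514, as in Table 1
```

For n = 3, the 18 functions canalizing for all three variables are the 8 functions of OR type, the 8 of AND type (with negated literals allowed) and the 2 constants.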



5 Randomly generating canalizing functions 

In simulating the behavior of random Boolean networks, it is important to be able to randomly generate canalizing functions with a given bias $p$ [9]. Our results in Section 4 allow us to do so by means of the following algorithm.

Recall that for $1 \le k \le n$, $\Pr_p(PCE_k)$ denotes the probability that a randomly chosen Boolean function with bias $p$ is positively (but not negatively) canalizing on exactly $k$ variables. Likewise, $\Pr_p(NCE_k)$ denotes the probability that a randomly chosen Boolean function with bias $p$ is negatively (but not positively) canalizing on exactly $k$ variables. Here is the algorithm.

Algorithm CanalizingFunctionGenerator(p)

Let q = 0 with probability Pr_p(BC)/Pr_p(C); for 1 ≤ k ≤ n let q = k with probability (Pr_p(PCE_k) + Pr_p(NCE_k))/Pr_p(C).

if q ≥ 1 then let r = 1 with probability Pr_p(PCE_q)/(Pr_p(PCE_q) + Pr_p(NCE_q)), and r = 0 with probability Pr_p(NCE_q)/(Pr_p(PCE_q) + Pr_p(NCE_q)).

if q == 0 then
    Randomly pick an input variable x_i.
    Randomly pick one of the two functions that are canalizing both ways on input x_i.
    return the function f that was just picked.
else if r == 1 then
    Randomly pick a subset S of [n] of size q.
    Randomly pick a function s : S → {0, 1}.
    For each input vector x such that x_i = s(i) for some i ∈ S, let f(x) = 1.
    repeat
        For each of the remaining input vectors x, independently and randomly let f(x) = 1 with probability p and f(x) = 0 with probability 1 - p.
    until the resulting function f is in PCE_q.
    return f.
else // r == 0
    Randomly pick a subset S of [n] of size q.
    Randomly pick a function s : S → {0, 1}.
    For each input vector x such that x_i = s(i) for some i ∈ S, let f(x) = 0.
    repeat
        For each of the remaining input vectors x, independently and randomly let f(x) = 1 with probability p and f(x) = 0 with probability 1 - p.
    until the resulting function f is in NCE_q.
    return f.

Note that the repeat ... until loops in this algorithm are necessary. When the remaining input vectors are assigned values randomly, the resulting function might, by chance, become canalizing for more than q variables. It might also become constant, or, for q = 1, canalizing both ways. Should this occur, we throw the function away and generate another one.
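A minimal Python sketch of CanalizingFunctionGenerator follows, under the following assumptions, which are ours and are not prescribed by the paper. Truth tables are integers whose bit x holds f(x). The weights are $\Pr_p(BC)$ together with $\Pr_p(PCE_k)$ from (31), (33) and (35), where $\Pr_p(NCE_k)$ is obtained by replacing p with 1 - p. These weights are normalized by `random.Random.choices`, so this sketch never returns the two constant functions. The helper names are illustrative.

```python
import random
from math import comb

def pr_pce(n, k, p):
    """Pr_p(PCE_k), excluding the constant functions: equation (31)/(33) for
    k >= 2 and equation (35) for k = 1. Pr_p(NCE_k) equals pr_pce(n, k, 1 - p)."""
    N = 2 ** n
    total = sum(comb(r, k) * (-1) ** (r - k) * comb(n, r)
                * 2 ** r * (p ** (N - 2 ** (n - r)) - p ** N)  # Lemma 2, (11)
                for r in range(k, n + 1))
    if k == 1:  # remove the 2n functions x_i, NOT x_i (canalizing both ways)
        total -= 2 * n * (p * (1 - p)) ** (N // 2)
    return total

def canalizing_sets(t, n):
    """Variables i for which fixing x_i to some value forces t to 1 (pos) or 0 (neg)."""
    pos, neg = set(), set()
    for i in range(n):
        for s in (0, 1):
            m = sum(1 << x for x in range(2 ** n) if (x >> i) & 1 == s)
            if t & m == m:
                pos.add(i)
            if t & m == 0:
                neg.add(i)
    return pos, neg

def random_canalizing(n, p, rng=None):
    """Sketch of CanalizingFunctionGenerator(p). Returns a truth table as an int
    (bit x holds f(x), x_i = (x >> i) & 1); constant functions are never returned."""
    rng = rng if rng is not None else random.Random()
    N, full = 2 ** n, (1 << 2 ** n) - 1
    # weight of q = 0 is Pr_p(BC); weight of q = k is Pr_p(PCE_k) + Pr_p(NCE_k)
    weights = [2 * n * (p * (1 - p)) ** (N // 2)]
    weights += [pr_pce(n, k, p) + pr_pce(n, k, 1 - p) for k in range(1, n + 1)]
    q = rng.choices(range(n + 1), weights=weights)[0]  # choices() normalizes
    if q == 0:  # pick x_i or NOT x_i uniformly
        i = rng.randrange(n)
        x_i = sum(1 << x for x in range(N) if (x >> i) & 1)
        return x_i if rng.random() < 0.5 else full ^ x_i
    pos_w, neg_w = pr_pce(n, q, p), pr_pce(n, q, 1 - p)
    r = 1 if rng.random() * (pos_w + neg_w) < pos_w else 0
    S = rng.sample(range(n), q)
    sigma = {i: rng.randrange(2) for i in S}
    forced = {x for x in range(N) if any((x >> i) & 1 == sigma[i] for i in S)}
    while True:  # repeat ... until f is in PCE_q (r = 1) or NCE_q (r = 0)
        t = 0
        for x in range(N):
            bit = r if x in forced else int(rng.random() < p)
            t |= bit << x
        pos, neg = canalizing_sets(t, n)
        hits, others = (pos, neg) if r == 1 else (neg, pos)
        if t not in (0, full) and len(hits) == q and not others:
            return t

rng = random.Random(7)
print([random_canalizing(3, 0.3, rng) for _ in range(5)])
```

A function in $PCE_q$ can arise only from its own canalizing set $S$ and its unique signature (Fact 1). The number of forced entries is the same for every such function. Together, these facts mean that each accepted function is drawn with probability proportional to $\Pr_p(f)$.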
n     |C|
1     4
2     14
3     120
4     3 514
5     1 292 276
6     103 071 426 294
7     516 508 833 342 349 371 376
8     10 889 035 741 470 030 826 695 916 769 153 787 968 498
9     ≈ 4.168 515 213 × 10^78
10    ≈ 5.363 123 172 × 10^155

Table 1
The number of canalizing functions with n input variables.